Miscellaneous functions for working with Dimodal results.
midquantile(x, q=((1:length(x))-1)/(length(x)-1), type=0L, feps=0.0)
runs.as.rle(runs, x)
select.peaks(pk)
center.diw(m)
match.features(m, near=10, foverlap=0.70, nomatch=NA_integer_, quiet=FALSE)
shiftID.place(feat, offset, xmid, midoff)
midquantile returns a vector the same length as the data. Quantiles
outside the range [0,1] return the first or last data point, even if this is
discontinuous with the values at 0 or 1. In other words, the function does
not follow the piecewise linear segment outside the valid range, but clips
it. NA or NaN quantiles propagate.
runs.as.rle returns a list of class "rle" with members
"lengths" and "values", as per the rle command. It also
adds a member "nskip" with the number of non-finite values in the data
within the run.
select.peaks returns a subset of the argument, possibly with zero
rows. If the argument is not a "Dipeak" object, it returns a dummy
empty object.
center.diw returns its argument with modified diw.peaks and
diw.flats, if they exist.
match.features returns a list with four elements. "peak.lp2diw"
is a vector with one element per row in lp.peaks whose value is the matching
row number in diw.peaks, or nomatch if there is no match or the
lp.peaks row is not a valid peak. "peak.diw2lp" is a similar map
from diw.peaks to lp.peaks. "flat.lp2diw" and
"flat.diw2lp" are the equivalent maps for flats.
shiftID.place returns the modified feat data frame.
m: a "Dimodal" object returned from Dimodal
x: the original data, with the same length as the members of runs
runs: the list returned from find.runs
pk: a "Dipeak" object
near: maximum distance in points between matching peaks, or as a fraction of the length of the original data
foverlap: minimum fraction of the length of either flat that the common segment must cover
nomatch: value to use when a feature has no match in the other spacing, treated as integer internally
quiet: a boolean, TRUE to only determine the matching, FALSE to also print the aligned features
q: quantile(s) for mid-quantile approximation, by default at the data indices
type: algorithm determining segments approximating x, an integer from 0 to 4 as described in Details
feps: tolerance for matching values, per find.runs
feat: a "Dipeak", "Diflat", or "Dicpt" data frame
offset: an integer, the amount to shift position of peaks or points or endpoints of flats
xmid: a vector of interpolated quantiles to convert indices back to raw data,
as stored in the "Didata"
midoff: an integer, the amount to shift positions in addition to offset
The midquantile function approximates the quantile function by
replacing the steps of the empirical distribution function (ecdf) with piecewise linear
segments; see Ma, Genton, and Parzen (2011). This creates a ramp over tied
or discrete values, giving a better estimate of the position of features,
especially when there are large gaps between modes and few or no data points
within them. The function determines the segment endpoints and by default
evaluates them on the original data grid, scaling the vector indices to run
from 0 through 1. It first converts the data to runs using find.runs,
with feps defining ties. Segmentation type 1 is the mid-distribution
function of Ma et al., with the data value at the ends of runs shifted to the middle
of the change. Segmentation type 2 instead shifts the quantiles by half an
index, extending the step in the ecdf. These two approaches can
create an envelope around the quantile function, with the type 1 offset from
the data at q = 0 and the type 2 at q = 1. Segmentation type 3 combines both
shifts, interpolating on a half grid for both x and q. It follows the
quantile function better, but does round off the curve at single data points.
In practice types 1 and 3 are close. Type 4 runs segments between the middle
of runs, or through the data points when there are none. This reduces its
estimation error, but the strategy does assume that the step in data to
either side of the run is about the same. If not, the other approximations
would move away from the center of the run.
Type 4 is best when the data has very few ties. Use type 3 or 1 when ties are common. Type 0 selects the strategy automatically, using 3 when there are ties and 4 when not. It decides whether there are enough ties with a simple check that compares the number of unique values to a tenth of the data.
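The mid-distribution idea can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's C implementation, and it does not reproduce any of the segmentation types exactly: each distinct value is placed at the middle of its ecdf step, and approx joins those points with straight lines. The helper name midq_sketch and the sample data are made up for the example.

```r
# Sketch only: mid-distribution interpolation in base R (hypothetical helper,
# not Dimodal's midquantile).
x <- c(1, 1, 1, 2, 3, 3, 5)
r <- rle(sort(x))                 # distinct values and their tie counts
p <- r$lengths / length(x)        # probability mass at each distinct value
Fmid <- cumsum(p) - p / 2         # middle of each ecdf step

midq_sketch <- function(q) {
  # rule = 2 clips quantiles beyond the first or last knot to the end values
  approx(Fmid, r$values, xout = q, rule = 2)$y
}

midq_sketch(c(-0.5, 0, 0.5, 1, NA))
```

Quantiles beyond the outermost knots return the first or last data value rather than following the end segments, and an NA quantile yields NA, which mirrors the clipping and propagation behavior described for midquantile.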
Internally midquantile makes two calls to C code.
.Call("C_midq", x, type, feps, PACKAGE="Dimodal") returns the
piecewise linear segments, with $x holding the endpoints along the
data and $q those along the quantiles.
.Call("C_eval_midq", pts$x, pts$q, q, PACKAGE="Dimodal") uses these
segments as the first two arguments and new quantiles as the third to
interpolate data values.
The find.runs returned value has two vectors with the length of the
data. One has non-zero values at the start of runs, the other counts skipped
invalid points. The "rle" class is more compact, storing only the runs
and the data values at the start. The runs.as.rle function does this
compaction.
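The compaction can be illustrated in base R. In the sketch below the full-length vector runlen is a hypothetical stand-in for the find.runs encoding (the run length at the start of each run and 0 elsewhere); the actual member names and contents of the find.runs list may differ, and the "nskip" member is omitted.

```r
# Sketch only: compacting a full-length run encoding into an "rle" object.
x <- c(2, 2, 2, 5, 7, 7)
runlen <- c(3L, 0L, 0L, 1L, 2L, 0L)   # assumed encoding, non-zero at run starts

starts <- which(runlen > 0L)          # indices where a run begins
compact <- structure(
  list(lengths = runlen[starts],      # one length per run
       values  = x[starts]),          # data value at the start of each run
  class = "rle"
)

compact
inverse.rle(compact)                  # expands back to the original data
```

Because the result carries the "rle" class, base functions such as inverse.rle and print work on it directly.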
find.peaks returns a data frame with not only maxima but also the
minima between them. It includes maxima even if they are at the first or last
point, with minima to only one side. select.peaks selects only
those peaks surrounded by minima. It may return a "Dipeak" object with
no rows. pk need not include the modifications from Dimodal;
select.peaks keeps all columns of its argument.
Indexing in interval spacing is at the end of the interval but the low-pass
filter is centered. center.diw shifts the interval spacing features
to align with the data, including peak positions, flat source identifiers,
and flat start and end points. Note that the raw value is already shifted
when set by Dimodal and will not change.
match.features aligns peaks and flats between the low-pass and
interval spacing. It compares only valid maxima, as per find.peaks,
and shifts interval spacing positions with center.diw before matching
them. Peaks must lie within near points to match. Flats must overlap,
and the common segment must be at least foverlap of the length of
either flat. The function prints the position, raw value, and the number of
tests that have passed their acceptance level, unless quiet is TRUE.
The nomatch value is cast internally to an integer and cannot be
between 1 and the number of features in either spacing, to prevent conflicts.
NA, 0, and negative values are acceptable.
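The role of near and nomatch in peak matching can be sketched as a nearest-neighbor lookup. This is an assumption about the general approach and not the package's algorithm; the helper match_near and the peak positions are hypothetical, and the flat overlap test and validity checks are left out.

```r
# Sketch only: map each position in a to the closest position in b, accepting
# the pair when they lie within `near` points (hypothetical helper).
match_near <- function(a, b, near = 10, nomatch = NA_integer_) {
  vapply(a, function(p) {
    d <- abs(b - p)
    j <- which.min(d)                       # index of the closest candidate
    if (length(j) == 1L && d[j] <= near) j else as.integer(nomatch)
  }, integer(1))
}

lp  <- c(40, 120, 210)    # made-up low-pass peak positions
diw <- c(45, 230)         # made-up interval spacing peak positions

match_near(lp, diw)               # map from lp rows to diw rows
match_near(diw, lp)               # the reverse map
match_near(lp, diw, near = 25)    # a looser tolerance admits the last pair
```

Returning row numbers is why a nomatch value inside the range of valid row indices would be ambiguous, while NA, 0, or a negative value cannot be mistaken for a match.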
The shiftID.place function is used in Dimodal to modify the
placement of features, and is provided separately if the detectors
find.peaks, find.flats, or find.cpt are called
directly. It adds offset to the columns "pos", "stID",
"endID", "lminID", "rminID", "lsuppID", and "rsuppID"
if they exist in the features data frame to account for values skipped
during filtering. Use the "lp.stID" attribute for low-pass features,
"diw.stID" for interval spacing, and 2 for changepoints. If pos, stID,
or endID are in the data frame the function also adds columns "x",
"xst", and "xend" respectively with the original data value for
the index by using the midquantile result xmid. Here the index is
further modified by midoff; use 0 for low-pass features and
changepoints, and half the interval width, stored as attribute "wdiw",
for interval spacing features.
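The column handling described above can be sketched in base R. The helper shift_sketch, the ntest column, and the sample values are hypothetical; the sketch assumes the data value is looked up at the shifted index plus midoff, and it only adds "x", whereas shiftID.place also adds "xst" and "xend".

```r
# Sketch only: hypothetical helper mimicking the documented column shifts.
shift_sketch <- function(feat, offset, xmid, midoff = 0L) {
  idcols <- c("pos", "stID", "endID", "lminID", "rminID", "lsuppID", "rsuppID")
  for (nm in intersect(idcols, names(feat))) {
    feat[[nm]] <- feat[[nm]] + offset      # shift only the columns present
  }
  # Assumed lookup: shifted index plus midoff selects the data value in xmid.
  if ("pos" %in% names(feat)) feat$x <- xmid[feat$pos + midoff]
  feat
}

xmid <- seq(10, 20, length.out = 50)       # stand-in for a midquantile result
feat <- data.frame(pos = c(5L, 20L), stID = c(3L, 15L), ntest = c(2L, 1L))
shift_sketch(feat, offset = 4L, xmid = xmid)
```

Columns outside the index list, such as ntest here, pass through unchanged.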
Y. Ma, M. Genton, E. Parzen (2011), Asymptotic properties of sample quantiles of discrete distributions. Ann Inst Stat Math 63, pp. 227--243.
m <- Dimodal(faithful$eruptions, Diopt.local(analysis=c('lp','diw')))
# How many peaks were found? Use print.data.frame to see the full structure.
nrow(select.peaks(m$lp.peaks))
nrow(select.peaks(m$diw.peaks))
# Compare the centered features to the original m$diw.peaks.
m$diw.peaks
center.diw(m)$diw.peaks
# Flats do not match because the Diw feature only covers 50% of the LP.
match.features(m)
plot(sort(iris$Petal.Length))
lines(midquantile(iris$Petal.Length, type=1L), col='red')
lines(midquantile(iris$Petal.Length, type=2L), col='blue')
lines(midquantile(iris$Petal.Length, type=3L), col='green')
lines(midquantile(iris$Petal.Length, type=4L), col='orange')
# See the Dimodal.R source code for the use of shiftID.place.
# To simplify the runs in the signed difference of the interval spacing
# runs.as.rle(Dimodal:::find.runs(m$data['signed',], 0.01), m$data['signed',])